AITopics | target 0

Collaborating Authors

target 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Constrained Entropic Unlearning: APrimal-Dual Framework for Large Language Models

Neural Information Processing SystemsJun-23-2026, 02:39:47 GMT

Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods typically formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. This often leads to unstable optimization and degraded performance on retained data, especially under aggressive forgetting. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss that explicitly drives the output distribution toward uniformity on a designated forget set, while retention is preserved through a hard constraint on a separate retain set. Compared to entropy-based objectives, our loss is softmaxfree, numerically stable, and maintains non-vanishing gradients, enabling more efficient and robust optimization. We solve the constrained problem using a scalable primal-dual algorithm that exposes the trade-off between forgetting and retention through the dynamics of the dual variable, all without any extra computational overhead. Evaluations on the TOFU and MUSE benchmarks across diverse LLM architectures demonstrate that our approach consistently matches or exceeds stateof-the-art baselines, effectively removing targeted information while preserving downstream utility.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Kuwait (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (0.66)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

Bronzini, Marco, Nicolini, Carlo, Lepri, Bruno, Staiano, Jacopo, Passerini, Andrea

arXiv.org Artificial IntelligenceDec-3-2025

Despite their capabilities, Large Language Models (LLMs) remain opaque with limited understanding of their internal representations. Current interpretability methods either focus on input-oriented feature extraction, such as supervised probes and Sparse Autoencoders (SAEs), or on output distribution inspection, such as logit-oriented approaches. A full understanding of LLM vector spaces, however, requires integrating both perspectives, something existing approaches struggle with due to constraints on latent feature definitions. We introduce the Hyperdimensional Probe, a hybrid supervised probe that combines symbolic representations with neural probing. Leveraging Vector Symbolic Architectures (VSAs) and hypervector algebra, it unifies prior methods: the top-down interpretability of supervised probes, SAE's sparsity-driven proxy space, and output-oriented logit investigation. This allows deeper input-focused feature extraction while supporting output-oriented investigation. Our experiments show that our method consistently extracts meaningful concepts across LLMs, embedding sizes, and setups, uncovering concept-driven patterns in analogy-oriented inference and QA-focused text generation. By supporting joint input-output analysis, this work advances semantic understanding of neural representations while unifying the complementary perspectives of prior methods.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2509.25045

Country: Europe > Italy (0.29)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models

Entesari, Taha, Hatami, Arman, Khaziev, Rinat, Ramakrishna, Anil, Fazlyab, Mahyar

arXiv.org Artificial IntelligenceOct-28-2025

Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods typically formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. This often leads to unstable optimization and degraded performance on retained data, especially under aggressive forgetting. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss that explicitly drives the output distribution toward uniformity on a designated forget set, while retention is preserved through a hard constraint on a separate retain set. Compared to entropy-based objectives, our loss is softmax-free, numerically stable, and maintains non-vanishing gradients, enabling more efficient and robust optimization. We solve the constrained problem using a scalable primal-dual algorithm that exposes the trade-off between forgetting and retention through the dynamics of the dual variable, all without any extra computational overhead. Evaluations on the TOFU and MUSE benchmarks across diverse LLM architectures demonstrate that our approach consistently matches or exceeds state-of-the-art baselines, effectively removing targeted information while preserving downstream utility.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.05314

Country: Asia > Middle East (0.68)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (0.92)
Law (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

A Machine Learning Approach Capturing Hidden Parameters in Autonomous Thin-Film Deposition

Zheng, Yuanlong, Blake, Connor, Mravac, Layla, Zhang, Fengxue, Chen, Yuxin, Yang, Shuolong

arXiv.org Artificial IntelligenceNov-27-2024

The integration of machine learning and robotics into thin film deposition is transforming material discovery and optimization. However, challenges remain in achieving a fully autonomous cycle of deposition, characterization, and decision-making. Additionally, the inherent sensitivity of thin film growth to hidden parameters such as substrate conditions and chamber conditions can compromise the performance of machine learning models. In this work, we demonstrate a fully autonomous physical vapor deposition system that combines in-situ optical spectroscopy, a high-throughput robotic sample handling system, and Gaussian Process Regression models. By employing a calibration layer to account for hidden parameter variations and an active learning algorithm to optimize the exploration of the parameter space, the system fabricates silver thin films with optical reflected power ratios within 2.5% of the target in an average of 2.3 attempts. This approach significantly reduces the time and labor required for thin film deposition, showcasing the potential of machine learning-driven automation in accelerating material development.

artificial intelligence, machine learning, wavelength, (17 more...)

arXiv.org Artificial Intelligence

2411.18721

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?

Zhong, Chengzhi, Cheng, Fei, Liu, Qianying, Jiang, Junfeng, Wan, Zhen, Chu, Chenhui, Murawaki, Yugo, Kurohashi, Sadao

arXiv.org Artificial IntelligenceAug-20-2024

In this study, we investigate whether non-English-centric LLMs, despite their strong performance, `think' in their respective dominant language: more precisely, `think' refers to how the representations of intermediate layers, when un-embedded into the vocabulary space, exhibit higher probabilities for certain dominant languages during generation. We term such languages as internal $\textbf{latent languages}$. We examine the latent language of three typical categories of models for Japanese processing: Llama2, an English-centric model; Swallow, an English-centric model with continued pre-training in Japanese; and LLM-jp, a model pre-trained on balanced English and Japanese corpora. Our empirical findings reveal that, unlike Llama2 which relies exclusively on English as the internal latent language, Japanese-specific Swallow and LLM-jp employ both Japanese and English, exhibiting dual internal latent languages. For any given target language, the model preferentially activates the latent language most closely related to it. In addition, we explore how intermediate layers respond to questions involving cultural conflicts between latent internal and target output languages. We further explore how the language identity shifts across layers while keeping consistent semantic meaning reflected in the intermediate layer representations. This study deepens the understanding of non-English-centric large language models, highlighting the intricate dynamics of language representation within their intermediate layers.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2408.10811

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

Add feedback

Cell Morphology-Guided Small Molecule Generation with GFlowNets

Lu, Stephen Zhewen, Lu, Ziqing, Hajiramezanali, Ehsan, Biancalani, Tommaso, Bengio, Yoshua, Scalia, Gabriele, Koziarski, Michał

arXiv.org Artificial IntelligenceAug-9-2024

High-content phenotypic screening, including high-content imaging (HCI), has gained popularity in the last few years for its ability to characterize novel therapeutics without prior knowledge of the protein target. When combined with deep learning techniques to predict and represent molecular-phenotype interactions, these advancements hold the potential to significantly accelerate and enhance drug discovery applications. This work focuses on the novel task of HCI-guided molecular design. Generative models for molecule design could be guided by HCI data, for example with a supervised model that links molecules to phenotypes of interest as a reward function. However, limited labeled data, combined with the high-dimensional readouts, can make training these methods challenging and impractical. We consider an alternative approach in which we leverage an unsupervised multimodal joint embedding to define a latent similarity as a reward for GFlowNets. The proposed model learns to generate new molecules that could produce phenotypic effects similar to those of the given image target, without relying on pre-annotated phenotypic labels. We demonstrate that the proposed method generates molecules with high morphological and structural similarity to the target, increasing the likelihood of similar biological activity, as confirmed by an independent oracle model.

arXiv.org Artificial Intelligence

2408.05196

Country:

North America > Canada > Quebec > Montreal (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Newell's theory based feature transformations for spatio-temporal traffic prediction

Sengupta, Agnimitra, Guler, S. Ilgin

arXiv.org Artificial IntelligenceJul-16-2023

Deep learning (DL) models for spatio-temporal traffic flow forecasting employ convolutional or graph-convolutional filters along with recurrent neural networks to capture spatial and temporal dependencies in traffic data. These models, such as CNN-LSTM, utilize traffic flows from neighboring detector stations to predict flows at a specific location of interest. However, these models are limited in their ability to capture the broader dynamics of the traffic system, as they primarily learn features specific to the detector configuration and traffic characteristics at the target location. Hence, the transferability of these models to different locations becomes challenging, particularly when data is unavailable at the new location for model training. To address this limitation, we propose a traffic flow physics-based feature transformation for spatio-temporal DL models. This transformation incorporates Newell's uncongested and congested-state estimators of traffic flows at the target locations, enabling the models to learn broader dynamics of the system. Our methodology is empirically validated using traffic data from two different locations. The results demonstrate that the proposed feature transformation improves the models' performance in predicting traffic flows over different prediction horizons, as indicated by better goodness-of-fit statistics. An important advantage of our framework is its ability to be transferred to new locations where data is unavailable. This is achieved by appropriately accounting for spatial dependencies based on station distances and various traffic parameters. In contrast, regular DL models are not easily transferable as their inputs remain fixed. It should be noted that due to data limitations, we were unable to perform spatial sensitivity analysis, which calls for further research using simulated data.

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Artificial Intelligence

2307.05949

Country:

North America > United States > California (0.06)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Consumer Products & Services > Travel (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Diffusion-based Time Series Imputation and Forecasting with Structured State Space Models

Alcaraz, Juan Miguel Lopez, Strodthoff, Nils

arXiv.org Artificial IntelligenceMay-6-2023

The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies, (conditional) diffusion models as state-of-the-art generative models and structured state space models as internal model architecture, which are particularly suited to capture long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.

imputation, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2208.09399

Country:

North America > United States > California (0.04)
North America > United States > Alabama (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Energy (1.00)
Health & Medicine > Diagnostic Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

From STL Rulebooks to Rewards

Aguilar, Edgar A., Berducci, Luigi, Brunnbauer, Axel, Grosu, Radu, Ničković, Dejan

arXiv.org Artificial IntelligenceOct-6-2021

The automatic synthesis of neural-network controllers for autonomous agents through reinforcement learning has to simultaneously optimize many, possibly conflicting, objectives of various importance. This multi-objective optimization task is reflected in the shape of the reward function, which is most often the result of an ad-hoc and crafty-like activity. In this paper we propose a principled approach to shaping rewards for reinforcement learning from multiple objectives that are given as a partially-ordered set of signal-temporal-logic (STL) rules. To this end, we first equip STL with a novel quantitative semantics allowing to automatically evaluate individual requirements. We then develop a method for systematically combining evaluations of multiple requirements into a single reward that takes into account the priorities defined by the partial order. We finally evaluate our approach on several case studies, demonstrating its practical applicability.

lander, obstacle, requirement, (17 more...)

arXiv.org Artificial Intelligence

2110.02792

Country: